35 research outputs found

    Weakly-Supervised Temporal Localization via Occurrence Count Learning

    We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model's theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements. Comment: Accepted at ICML 2019.
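    The abstract above describes training a detector from clip-level event counts alone. As a rough illustration of that weak-supervision setup (a sketch under assumed shapes and a simple expected-count surrogate loss, not the authors' actual model or objective), a sequence network can emit per-timestep event probabilities and be trained so that their sum matches the annotated count:

```python
# Illustrative sketch only: all names, shapes, and the surrogate loss are assumptions.
import torch
import torch.nn as nn

class CountSupervisedDetector(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, n_features)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # per-timestep probabilities

def expected_count_loss(probs, counts):
    # Surrogate objective: match the expected number of events
    # (sum of per-timestep probabilities) to the annotated count.
    return ((probs.sum(dim=1) - counts) ** 2).mean()

model = CountSupervisedDetector(n_features=40)
x = torch.randn(8, 200, 40)                   # e.g. 8 clips of 200 frames each
counts = torch.randint(0, 10, (8,)).float()   # only count labels are available
loss = expected_count_loss(model(x), counts)
loss.backward()
```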

    Poisson-Binomial counting for learning prediction sparsity

    The loss function is an integral component of any successful deep neural network training: it guides the optimization process by reducing all aspects of a model into a single number that should best capture the overall objective of the learning task. Recently, the maximum-likelihood parameter estimation principle has grown to become the default framework for selecting loss functions, hence the prevalence of the cross-entropy for classification and the mean-squared error for regression applications (Goodfellow et al., 2016). Loss functions can, however, be tailored further to convey prior knowledge about the task or the dataset at hand to the training process (e.g., class imbalances (Huang et al., 2016a; Cui et al., 2019), perceptual consistency (Reed et al., 2014), and attribute awareness (Jiang et al., 2019)). Overall, by designing loss functions that account for known priors, more targeted supervision can be achieved, often with improved performance.

    In this work, we focus on the ubiquitous prior of prediction sparsity, which underlies many applications that involve probability estimation. More precisely, while the iterative nature of gradient-descent learning often requires models to be able to continuously reach any probability estimate between 0 and 1 during training, the optimal solution to the optimization problem (w.r.t. the ground truth) is often sparse, with clear-cut probabilities (i.e., converging towards either 1 or 0). For instance, in object detection, the decision that models must make to either keep or discard estimated bounding boxes for final predictions (e.g., non-maximum suppression) is binary. Similarly, in music onset detection, the optimal predictions are sparse: only a few points in time should be assigned a high likelihood, while no probability mass should be allocated to the remaining timesteps. In these applications, incorporating this important prior directly into the training process through the design of the loss function offers more tailored supervision that better captures the underlying objective.

    To that effect, this work introduces a novel loss function that relies on instance counting to achieve prediction sparsity. More precisely, as shown in the theoretical part of this work, modeling occurrence counts as a Poisson-binomial distribution results in a differentiable training objective with the unique intrinsic ability to drive probability estimates towards sparsity. In this setting, sparsity is thus not attained through an explicit sparsity-inducing operation, but is implicitly learned by the model as a byproduct of learning to count instances. We demonstrate that this cost function can be leveraged as a standalone loss function (e.g., for the weakly-supervised learning of temporal localization) as well as a sparsity regularizer used in conjunction with other, more targeted loss functions to enforce sparsity constraints in an end-to-end fashion. By design, the proposed approach finds use in the many applications where the optimal predictions are known to be sparse. We thus demonstrate the validity of the loss function on a wide array of tasks, including weakly-supervised drum detection, piano onset detection, single-molecule localization microscopy, and robust event detection in videos and in wearable-sensor time series. Overall, the experiments conducted in this work not only highlight the effectiveness and relevance of Poisson-binomial counting as a means of supervision, but also demonstrate that integrating prediction sparsity directly into the learning process can have a significant impact on generalization capability, noise robustness, and detection accuracy.
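    A minimal sketch of the count-based objective described above, assuming one plausible implementation: the sequential-convolution construction of the Poisson-binomial probability mass function is standard, but the variable names, shapes, and numerical details here are illustrative rather than the paper's reference code.

```python
import torch

def poisson_binomial_pmf(probs):
    """probs: (batch, T) per-timestep probabilities -> (batch, T+1) count PMF."""
    batch, T = probs.shape
    pmf = torch.zeros(batch, T + 1, dtype=probs.dtype, device=probs.device)
    pmf[:, 0] = 1.0                                    # zero events before any timestep
    for t in range(T):
        p = probs[:, t:t + 1]                          # (batch, 1)
        shifted = torch.cat([torch.zeros_like(p), pmf[:, :-1]], dim=1)
        pmf = pmf * (1 - p) + shifted * p              # fold in one more Bernoulli
    return pmf

def count_nll(probs, counts):
    # Negative log-likelihood of the annotated counts under the count distribution.
    pmf = poisson_binomial_pmf(probs)
    likelihood = pmf.gather(1, counts.long().unsqueeze(1)).clamp_min(1e-12)
    return -likelihood.log().mean()

probs = torch.rand(4, 50, requires_grad=True)          # stand-in for model outputs
counts = torch.tensor([3, 0, 7, 1])                    # count labels per sequence
count_nll(probs, counts).backward()                    # fully differentiable objective
```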

    Between images and built form: automating the recognition of standardised building components using deep learning

    Building on the richness of recent contributions in the field, this paper presents a state-of-the-art CNN analysis method for automating the recognition of standardised building components in modern heritage buildings. At the turn of the twentieth century, manufactured building components became widely advertised for specification by architects, and a degree of standardisation across various typologies consequently began to take place. During this era of rapid economic and industrialised growth, many forms of public building were erected. This paper demonstrates a method for recognising such elements, using deep learning to identify ‘families’ of elements across a range of buildings in order to retrieve their technical specifications from the contemporary trade literature. The method is illustrated through the case of Carnegie Public Libraries in the UK, which provides a unique but ubiquitous platform from which to explore the potential for the automated recognition of manufactured standard architectural components. The aim of enhancing this knowledge base is to use the degree to which these components were originally standardised as a means to inform and support not only their ongoing care but also that of many other contemporary buildings. Although these libraries are numerous, they are maintained at a local level, and their shared maintenance challenges therefore remain unknown from one library to another. Additionally, this paper presents a methodology to indirectly retrieve useful indicators and semantics relating to emerging HBIM families by applying deep learning to a varied range of architectural imagery.
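    As a hedged illustration of the kind of pipeline described (the dataset folder, class granularity, and backbone below are hypothetical assumptions, not details taken from the paper), recognising component ‘families’ can be framed as fine-tuning a pretrained CNN classifier on labelled photographs of building elements:

```python
# Sketch of a transfer-learning classifier for component "families"; the folder
# "library_components/" (one sub-folder per family) is a hypothetical layout.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("library_components/", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():                       # freeze the pretrained backbone
    p.requires_grad_(False)
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # new family head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:                      # one illustrative training pass
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```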

    Common Limitations of Image Processing Metrics: A Picture Story

    While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of automatic algorithms, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This living, dynamically updated document illustrates important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide. Comment: This is a dynamic paper on the limitations of commonly used metrics. The current version discusses metrics for image-level classification, semantic segmentation, object detection, and instance segmentation. For missing use cases, comments, or questions, please contact [email protected] or [email protected]. Substantial contributions to this document will be acknowledged with a co-authorship.
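    One of the pitfalls highlighted above, the behaviour of metrics in the presence of class imbalance, can be made concrete with a small synthetic example (toy numbers chosen purely for illustration):

```python
# Under heavy class imbalance, accuracy looks excellent even when the
# classifier never finds the minority (target) class.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

y_true = np.array([1] * 5 + [0] * 995)       # 5 positive cases out of 1000
y_pred = np.zeros(1000, dtype=int)           # model always predicts "negative"

print(accuracy_score(y_true, y_pred))             # 0.995 -- looks great
print(balanced_accuracy_score(y_true, y_pred))    # 0.5   -- chance level
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0   -- no positives found
```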

    Natalizumab treatment shows low cumulative probabilities of confirmed disability worsening to EDSS milestones in the long-term setting.

    Background: Though the Expanded Disability Status Scale (EDSS) is commonly used to assess disability level in relapsing-remitting multiple sclerosis (RRMS), the criteria defining disability progression are used for patients with a wide range of baseline levels of disability in relatively short-term trials. As a result, not all EDSS changes carry the same weight in terms of future disability, and treatment benefits such as decreased risk of reaching particular disability milestones may not be reliably captured. The objectives of this analysis are to assess the probability of confirmed disability worsening to specific EDSS milestones (i.e., EDSS scores ≥3.0, ≥4.0, or ≥6.0) at 288 weeks in the Tysabri Observational Program (TOP) and to examine the impact of relapses occurring during natalizumab therapy in TOP patients who had received natalizumab for ≥24 months.

    Methods: TOP is an ongoing, open-label, observational, prospective study of patients with RRMS in clinical practice. Enrolled patients were naive to natalizumab at treatment initiation or had received ≤3 doses at the time of enrollment. Intravenous natalizumab (300 mg) infusions were given every 4 weeks, and the EDSS was assessed at baseline and every 24 weeks during treatment.

    Results: Of the 4161 patients enrolled in TOP with follow-up of at least 24 months, 3253 patients with available baseline EDSS scores had continued natalizumab treatment and 908 had discontinued (5.4% due to a reported lack of efficacy and 16.4% for other reasons) at the 24-month time point. Those who discontinued due to lack of efficacy had higher baseline EDSS scores (median 4.5 vs. 3.5), higher on-treatment relapse rates (0.82 vs. 0.23), and higher cumulative probabilities of EDSS worsening (16% vs. 9%) at 24 months than those completing therapy. Among 24-month completers, after approximately 5.5 years of natalizumab treatment, the cumulative probabilities of confirmed EDSS worsening by 1.0 and 2.0 points were 18.5% and 7.9%, respectively (24-week confirmation), and 13.5% and 5.3%, respectively (48-week confirmation). The risks of 24- and 48-week confirmed EDSS worsening were significantly higher in patients with on-treatment relapses than in those without relapses. An analysis of time to specific EDSS milestones showed that the probabilities of 48-week confirmed transition from EDSS scores of 0.0–2.0 to ≥3.0, 2.0–3.0 to ≥4.0, and 4.0–5.0 to ≥6.0 at week 288 in TOP were 11.1%, 11.8%, and 9.5%, respectively, with lower probabilities observed among patients without on-treatment relapses (8.1%, 8.4%, and 5.7%, respectively).

    Conclusions: In TOP patients with a median (range) baseline EDSS score of 3.5 (0.0–9.5) who completed 24 months of natalizumab treatment, the rate of 48-week confirmed disability worsening events was below 15%; after approximately 5.5 years of natalizumab treatment, 86.5% and 94.7% of patients did not have EDSS score increases of ≥1.0 or ≥2.0 points, respectively. The presence of relapses was associated with higher rates of overall disability worsening. These results were confirmed by assessing transition to EDSS milestones. Lower rates of overall 48-week confirmed EDSS worsening and of transitioning from EDSS score 4.0–5.0 to ≥6.0 in the absence of relapses suggest that relapses remain a significant driver of disability worsening and that on-treatment relapses in natalizumab-treated patients are of prognostic importance.
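    For readers unfamiliar with the endpoint, the notion of 24-week-confirmed EDSS worsening can be sketched in code under a deliberately simplified definition (a ≥1.0-point increase over baseline that is still present at a visit at least 24 weeks later); the visit schedule and column names below are hypothetical and do not reflect the TOP study protocol.

```python
# Minimal sketch with hypothetical visit-level EDSS scores for one patient.
import pandas as pd

visits = pd.DataFrame({
    "week": [0, 24, 48, 72, 96, 120],
    "edss": [3.5, 3.5, 4.5, 4.5, 5.0, 5.0],
})

baseline = visits.loc[visits["week"] == 0, "edss"].iloc[0]
worsened = visits[(visits["week"] > 0) & (visits["edss"] >= baseline + 1.0)]

# A worsening visit counts as confirmed if some visit at least 24 weeks later
# still shows an increase of >= 1.0 point over baseline.
confirmed = False
for _, onset in worsened.iterrows():
    later = visits[visits["week"] >= onset["week"] + 24]
    if (later["edss"] >= baseline + 1.0).any():
        confirmed = True
        break

print(confirmed)  # True: the week-48 increase is still present at week 72
```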